2023 Spatial Data Analysis
1 INTRODUCTION
The tidycensus package offers a wide array of functions for retrieving decennial census and American Community Survey (ACS) data via the Census API. We obtain the data through the get_acs function, which returns the estimates together with geometry for the 2015-2019 American Community Survey dataset.
It may be worthwhile to add a progress_bar = FALSE argument to a get_acs function call, especially when working within an RMarkdown or Quarto document. This suppresses the download progress bar so that it is not printed when the document is rendered.
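As a rough illustration of such a call, the sketch below requests tract-level counts for California with geometry and without the progress bar. The variable codes (from ACS table B03002), the four group names, and the object name ca_acs_data are assumptions made for this example, not the exact request used to build the data in this post. The download is only attempted when a Census API key is available in the CENSUS_API_KEY environment variable.

```r
# Named vector: the names become the values of the `variable` column.
# These codes are assumptions drawn from ACS table B03002.
race_vars <- c(
  white    = "B03002_003",
  black    = "B03002_004",
  asian    = "B03002_006",
  hispanic = "B03002_012"
)

# Only attempt the download when tidycensus and an API key are available.
if (nzchar(Sys.getenv("CENSUS_API_KEY")) &&
    requireNamespace("tidycensus", quietly = TRUE)) {
  ca_acs_data <- tidycensus::get_acs(
    geography    = "tract",
    variables    = race_vars,
    state        = "CA",
    year         = 2019,    # end year of the 2015-2019 five-year ACS
    geometry     = TRUE,    # attach tract geometry to the estimates
    progress_bar = FALSE    # keep rendered documents free of progress output
  )
}
```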
1.1 EXPLORATORY ANALYSIS: DISSIMILARITY
With the spatial data in hand, we can explore it in more depth. Here, the segregation package offers a dissimilarity index function (conveniently named dissimilarity). The function returns the total segregation between group and unit using the Index of Dissimilarity (Elbers 2023). Importantly, the dissimilarity index only considers differences between two distinct groups. As a first step, we compute the dissimilarity index between Hispanic and White residents in San Francisco–Oakland.
stat est
<char> <num>
1: D 0.5135526
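To make the statistic concrete, the index can be computed by hand as D = 0.5 * sum(|a_i / A - b_i / B|), where a_i and b_i are the counts of the two groups in unit i and A and B are their totals. The sketch below uses hypothetical counts (not census data) and base R only; it illustrates the formula and is not the package's internal implementation.

```r
# Hypothetical tract-level counts for two groups (made-up numbers)
toy_tracts <- data.frame(
  GEOID    = c("t1", "t2", "t3", "t4"),
  white    = c(100, 200, 50, 650),
  hispanic = c(400, 100, 450, 50)
)

# Share of each group's total population living in each tract
white_share    <- toy_tracts$white / sum(toy_tracts$white)
hispanic_share <- toy_tracts$hispanic / sum(toy_tracts$hispanic)

# Half the sum of absolute differences between the two share distributions
D <- 0.5 * sum(abs(white_share - hispanic_share))
D
```

A value of 0 would mean both groups are spread across tracts in identical proportions, and a value of 1 would mean no tract contains members of both groups.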
To put the dissimilarity index above in context, we can compare it across urban areas. Below, we split the data by urban area name, apply the function to each group, and finally combine the outputs. This approach differs slightly from the one in the book, which offers a tidier and more succinct method.
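# Split the data into one data frame per urban area
Group_Wise <- split(ca_urban_data, ca_urban_data$urban_name)

# Compute the Hispanic-White dissimilarity index within each urban area
Group_Wise <- lapply(Group_Wise, function(x) {
  x |>
    filter(variable %in% c("white", "hispanic")) |>
    dissimilarity(group = "variable",
                  unit = "GEOID",
                  weight = "estimate")
})

# Label each result with its urban area name and stack the rows
Group_Wise <- bind_rows(map2(Group_Wise, names(Group_Wise), function(x, y) {
  x |>
    mutate(urban_name = y)
}))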
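For comparison, a tidier variant of the same split-apply-combine pattern can be sketched with group_by() and group_modify() from dplyr. The toy_data table below is hypothetical: it only mimics the column layout used above (urban_name, GEOID, variable, estimate) with made-up counts, and the code is not necessarily identical to the version in the book.

```r
library(dplyr)
library(segregation)

# Hypothetical long-format data with the same columns as ca_urban_data
toy_data <- data.frame(
  urban_name = rep(c("Area A", "Area B"), each = 4),
  GEOID      = rep(c("a1", "a2", "b1", "b2"), each = 2),
  variable   = rep(c("white", "hispanic"), times = 4),
  estimate   = c(90, 10, 10, 90, 50, 50, 50, 50)
)

# One dissimilarity index per urban area, sorted from most to least segregated
toy_dissim <- toy_data |>
  filter(variable %in% c("white", "hispanic")) |>
  group_by(urban_name) |>
  group_modify(~ dissimilarity(.x,
                               group = "variable",
                               unit = "GEOID",
                               weight = "estimate")) |>
  ungroup() |>
  arrange(desc(est))

toy_dissim
```

Because group_modify() keeps the grouping column in the result, no separate step is needed to re-attach the urban area names.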
Across urban areas, Los Angeles–Long Beach has the highest dissimilarity index at 0.600. The dissimilarity index ranges from 0 to 1, where 0 represents perfect integration between the two groups and 1 represents complete segregation (Walker 2023, 215). The table below puts our earlier dissimilarity index value for San Francisco–Oakland in context.
urban_name | stat | est |
---|---|---|
Los Angeles--Long Beach--Anaheim, CA Urbanized Area (2010) | D | 0.5999229 |
San Francisco--Oakland, CA Urbanized Area (2010) | D | 0.5135526 |
San Jose, CA Urbanized Area (2010) | D | 0.4935633 |
San Diego, CA Urbanized Area (2010) | D | 0.4898184 |
Riverside--San Bernardino, CA Urbanized Area (2010) | D | 0.4079863 |
Sacramento, CA Urbanized Area (2010) | D | 0.3687927 |
Among the urban areas, Los Angeles and San Francisco are the most segregated between Hispanic and White residents, while Riverside–San Bernardino and Sacramento have the least segregation. We can expand on the dissimilarity index by considering more than two groups. Again, we rely on the segregation package, this time for its implementation of the Mutual Information Index and Theil's Entropy Index. These indices measure diversity and segregation across multiple groups (Walker 2023, 217) in California urban areas.
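As a small worked illustration of what these two indices measure, the sketch below computes M and H by hand in base R for a hypothetical table of counts. It assumes the standard definitions: M is the mutual information between unit and group, and H is M divided by the entropy of the overall group distribution. The numbers are made up, only two groups are used for brevity, and the code is not the package's internal implementation.

```r
# Hypothetical counts: rows are units (e.g. tracts), columns are groups
counts <- matrix(c(90, 10,
                   10, 90),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("u1", "u2"), c("white", "hispanic")))

p_ug <- counts / sum(counts)  # joint proportions of unit and group
p_u  <- rowSums(p_ug)         # unit marginals
p_g  <- colSums(p_ug)         # group marginals

# Helper that treats 0 * log(0) as 0
xlogy <- function(x, y) ifelse(x > 0, x * log(y), 0)

# M compares the observed joint distribution with the one expected
# under independence of unit and group
M <- sum(xlogy(p_ug, p_ug / outer(p_u, p_g)))

# Entropy of the group distribution, used to normalise M into H
E <- -sum(xlogy(p_g, p_g))
H <- M / E

c(M = M, H = H)
```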
# Multi-group segregation (M and H) computed within each urban area
mutual_within(data = ca_urban_data,
              group = "variable",
              unit = "GEOID",
              weight = "estimate",
              within = "urban_name",
              wide = TRUE) |>
  arrange(desc(H)) |>
  # Format the result as a display table with readable column labels
  gt() |>
  cols_label(
    urban_name = "URBAN NAME",
    M = "M Index (M)",
    H = "H Index (H)",
    p = "Proportion of the category (p)",
    ent_ratio = "Entropy Ratio"
  ) |>
  gt_theme_espn()
URBAN NAME | M Index (M) | Proportion of the category (p) | H Index (H) | Entropy Ratio |
---|---|---|---|---|
Los Angeles--Long Beach--Anaheim, CA Urbanized Area (2010) | 0.3391033 | 0.50163709 | 0.2851662 | 0.9693226 |
San Francisco--Oakland, CA Urbanized Area (2010) | 0.2685992 | 0.13945223 | 0.2116127 | 1.0346590 |
San Diego, CA Urbanized Area (2010) | 0.2290891 | 0.12560720 | 0.2025728 | 0.9218445 |
San Jose, CA Urbanized Area (2010) | 0.2147445 | 0.07282785 | 0.1829190 | 0.9569681 |
Sacramento, CA Urbanized Area (2010) | 0.1658898 | 0.07369482 | 0.1426804 | 0.9477412 |
Riverside--San Bernardino, CA Urbanized Area (2010) | 0.1497129 | 0.08678082 | 0.1408461 | 0.8664604 |
The results of the multi-group indices are largely similar to those of the dissimilarity index, with Los Angeles remaining the most segregated urban area in California. Because Los Angeles is a large area, it may be worthwhile to extend the analysis to the local level. Local analysis is a more granular approach that shows how segregation varies from unit to unit within the urban area.
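# Local segregation scores for each unit in the Los Angeles urban area
la_local_seg <- ca_urban_data %>%
  filter(str_detect(urban_name, "Los Angeles")) %>%
  mutual_local(
    group = "variable",
    unit = "GEOID",
    weight = "estimate",
    wide = TRUE
  )

# Attach the scores to tract geometries, keeping only matching tracts
la_tracts_seg <- tracts("CA", cb = TRUE, year = 2019) %>%
  inner_join(la_local_seg, by = "GEOID")

# Interactive map of the local segregation score (column `ls`)
tmap_mode("view")
tm_shape(la_tracts_seg) +
  tm_borders("black", lwd = .5) +
  tm_polygons("ls",
              palette = "viridis",
              title = "Local\nsegregation index")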
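To give a sense of what the mapped local score represents, the sketch below computes it by hand in base R for hypothetical counts. It assumes the usual definition, in which a unit's local score compares its own group composition with the overall composition, and the population-weighted sum of local scores equals the total M. The counts and object names are made up for illustration.

```r
# Hypothetical counts: rows are units, columns are groups
counts <- rbind(u1 = c(white = 90, hispanic = 10),
                u2 = c(white = 10, hispanic = 90),
                u3 = c(white = 50, hispanic = 50))

p_u <- rowSums(counts) / sum(counts)     # each unit's share of the population
p_g <- colSums(counts) / sum(counts)     # overall group composition
p_g_given_u <- counts / rowSums(counts)  # group composition within each unit

# Ratio of each unit's composition to the overall composition
ratio <- sweep(p_g_given_u, 2, p_g, "/")

# Local score per unit, treating 0 * log(0) as 0
local_seg <- rowSums(ifelse(p_g_given_u > 0, p_g_given_u * log(ratio), 0))

# Weighting the local scores by unit size recovers the total M
M <- sum(p_u * local_seg)

local_seg
M
```

In this toy example the unit whose composition matches the overall composition (u3) scores 0, while the two lopsided units score highest.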